Abstract. We discuss the large-scale properties of standard cold dark matter cosmological models, characterizing the main features of the power-spectrum, of the two-point correlation function and of the mass variance. Both real-space statistics have a very well defined behavior on large enough scales, where their amplitudes become smaller than unity. The correlation function, in the range $0 < \xi(r) < 1$, is characterized by a typical length-scale $r_c$, at which $\xi(r_c) = 0$, which is fixed by the physics of the early universe: beyond this scale it becomes negative, going to zero with a tail proportional to $-r^{-4}$. These anti-correlations thus represent an important observational challenge for verifying models in real space. The same length scale $r_c$ characterizes the behavior of the mass variance, which decays, for $r > r_c$, as $r^{-4}$, the fastest decay for any mass distribution. The length-scale $r_c$ defines the maximum extension of (positively correlated) structures in these models. These are the features expected for the dark matter field: galaxies, which represent a biased field, may however differ from these behaviors, and we analyze how. We then discuss the detectability of these real space features by considering several estimators of the two-point correlation function. By making tests on numerical simulations we emphasize the important role of finite size effects, which should always be controlled for careful measurements.
1. Introduction

In contemporary cosmological models the structures observed today at large scales in the distribution of galaxies in the universe are explained by the dynamical evolution of purely self-gravitating matter (dark matter) from an initial state with low amplitude density fluctuations, the latter strongly constrained by satellite observations of the fluctuations in the temperature of the cosmic microwave background radiation. The other main observational element for the understanding of the large scale structure of the universe is represented by the studies of galaxy correlations. Any theoretical model aiming to explain the formation of structures must be tested against the data provided by galaxy surveys, which give the important bridge between the regimes characterized by large and small fluctuations.

Models of the early universe (see e.g. Padmanabhan, 1993 and references therein) predict certain primordial fluctuations in the matter density field, defining the correlations of the initial conditions, i.e. at the time of decoupling between matter and radiation. In the regime where density fluctuations are small enough, the correlation function of the present matter density field is simply related to the one describing the initial conditions. In fact, according to the growth of gravitational instabilities in an expanding universe, in the linear regime perturbations are simply amplified (see e.g., Peebles, 1980 and references therein). Thus today, at large scales where the correlation function is still positive but with $\xi(r) < 1$, the imprint of primordial fluctuations should be preserved. In the region of strong non-linear fluctuations an analytical treatment to predict the behavior of the two-point correlation function has not been developed yet and, in general, one makes use of numerical simulations, which provide a rich, but phenomenological, description of structure in the non-linear regime. It is in this regime, at small enough scales, that most observations have been performed until now.

We focus here on the type of correlations predicted in the linear regime by models of the early universe. While the characterization of correlations is usually done in terms of the power-spectrum of the density fluctuations, a real space analysis turns out to be useful to point out some relevant features from an observational point of view (see, e.g., the discussion in Gabrielli et al., 2004).

Theoretical models of primordial matter density fields in the expanding universe are characterized by a single well-defined length scale, which is an imprint of the physics of the early universe at the time of the decoupling between matter and radiation (see e.g. Bond and Efstathiou 1984, and Padmanabhan 1993 for a general introduction to the problem). The redshift characterizing the decoupling is directly related to the scale at which the change of slope of the power-spectrum of matter density fluctuations $P(k)$ occurs, i.e. it defines the wavenumber $k_c$ at which there is the turnover of the power-spectrum between a regime, at large enough $k$, where it behaves as a negative power-law of the wavenumber, $P(k) \sim k^m$ with $-3 < m < -1$, and a regime at small $k$ where $P(k) \sim k$, as predicted by inflationary theories. Given the generality of this prediction, it is clearly extremely important to look for this scale in the data.

The exact location of this scale is related to several parameters, including the cosmological ones which describe the geometry of the universe at large scales (see e.g. Padmanabhan 1993, Tegmark et al. 2004 and Spergel et al. 2007 for a recent determination). We show in what follows that the scale $r_c$ corresponding to the wavenumber $k_c$, in a particular variant of Cold Dark Matter (CDM) models, the so-called ΛCDM vanilla model, is predicted to be $r_c \approx 124$ Mpc/h¹. At this scale the real space correlation function crosses zero, becoming negative at larger scales. In particular the correlation function presents a positive power-law behavior at scales $r \ll r_c$ and a negative power-law behavior at scales $r \gg r_c$. Positive and negative correlations are exactly balanced, in such a way that the integral of the correlation function over the whole space is equal to zero. This is a global condition on the system fluctuations which corresponds to the fact that the distribution is super-homogeneous (or hyper-uniform), i.e. characterized by a sort of stochastic order and by fluctuations which are depressed with respect to, for example, a purely uncorrelated distribution of matter (Gabrielli, Joyce and Sylos Labini, 2002; see discussion below).

Note that the scale $r_c$ marks the maximum extension of positively correlated structures: beyond $r_c$ the distribution must have been anti-correlated from the beginning, as the evolution time was not sufficient for positive correlations to develop. Thus this scale can be regarded as an upper limit to the maximum size of structures (with large or weak correlations) in the present universe. The possible discovery of structures of larger size is still a challenging task for observational cosmology.

A relevant problem for the measurement of small amplitude values of the correlation function, i.e. when $\xi(r) < 1$, is represented by the characterization and the understanding of both the systematic biases which may affect the estimators of $\xi(r)$ and the stochastic noise which perturbs any real determination. A study of these problems can be found, for example, in Kerscher (1999) and Kerscher et al. (2000), where it is shown that in general the biases in several estimators of the two-point correlation function are not negligible. In particular, when there are structures of large spatial extension inside a given sample there can be non-negligible biases affecting the determination of two-point properties. We focus here on the systematic bias related to the effect of the so-called integral constraint, which distorts any estimator of the correlation function at large scales in any given sample. The integral constraint represents an overall condition on any estimator of the correlation function, which is due to the fact that the average density, estimated in any given sample, is in general different from its ensemble average value.

¹ For the sake of clarity we have chosen to normalize distances to the dimensionless Hubble parameter $h$, which is defined through the Hubble constant $H_0 = 100\,h$ km/sec/Mpc.

Here we treat explicitly the case of the simplest estimator of the two-point correlation function, the so-called full-shell or minus estimator, and we illustrate the situation for the other estimators by studying artificial distributions. In particular we devote most attention to the estimator introduced by Davis and Peebles (1983), which is still widely used in the literature, and to the estimator introduced by Landy and Szalay (1993), which is the most popular one. Kerscher et al. (2000) also considered other estimators, like the Hewett estimator (Hewett, 1982) and the Hamilton estimator (Hamilton, 1993), and have shown that the results obtained with the Landy and Szalay estimator are almost indistinguishable from those of the Hamilton estimator.

In this way we will be able to identify the problems related to the identification of correlations above the mentioned scale $r_c$: we will then propose several tests to be applied to the galaxy data, in order to define a strategy for studying the correlation function at small amplitudes and large distances and, eventually, for detecting the length scale $r_c$.

Up to now studies of the correlation function $\xi(r)$ in galaxy samples have been limited to small scales, i.e. $0.1 < r < 30$ Mpc/h (e.g. Totsuji & Kihara, 1969, Davis and Peebles, 1983, Davis et al., 1988, Benoist et al., 1996, Park et al., 1994, Scranton et al., 2002, Zehavi et al., 2002, Zehavi et al., 2004, Ross et al., 2007), and only recently has the volume covered by galaxy redshift samples approached a size which is large enough to make a robust estimation of the correlation function at scales of order 100 Mpc/h. When the Sloan Digital Sky Survey (SDSS) (York et al., 2000) is completed by filling the gap between the two main angular regions of observations, which are currently disjoint, the volume of the survey and the number of objects in the samples will be large enough to test space correlations on scales of order $r_c$ or more. An exception to this situation is represented by the paper by Eisenstein et al. (2005), who, by studying a sample of Luminous Red Galaxies (LRG) of the SDSS, have estimated the correlation function on scales of order 100 Mpc/h. These authors have however focused their attention on another real space feature of theoretical models: the so-called "bump" of the correlation function, which corresponds in real space to the so-called Doppler peaks in the matter power-spectrum generated by the baryonic acoustic oscillations in the early universe. As we discuss below this bump, corresponding to a singular point of the correlation function (Gabrielli et al., 2004), is localized at scales of order 100 Mpc/h and characterized by a small amplitude. This is a second important real-space scale of the theoretical correlation function, which is localized at a scale slightly smaller than $r_c$. The detection of the baryonic bump is thus related to the detection of the scale $r_c$, as any finite-size effect perturbing the determination of the scale $r_c$ will, inevitably, also affect the determination of the baryonic bump. In fact the baryonic bump can be seen as a small modification to the overall shape of the correlation function at scales of order $r_c$, on which we focus our attention here.

Note that, because of the very large scales, the acoustic 
signature and the zero point scale remain in the linear 
regime even today and they are weakly affected by non- 
linear effects (see Eisenstein et al., 2006). Thus real space 
and redshift space properties, at such large scales, should 
not differ substantially. 

In Section 2 we introduce the basic definitions of 
the statistical quantities usually employed to character- 
ize two-point properties in real and Fourier space. In 
Section 3 we discuss a simple functional behavior of the 
power-spectrum of matter density fluctuations which cap- 
tures the main elements of a more realistic CDM power- 
spectrum. We discuss the real-space properties as repre- 
sented by the two-point correlation function and we con- 
sider the problem of selection or biasing in the simplest 
theoretical scheme of biasing a correlated Gaussian field. 
In Section 4 we treat explicitly the case of a ΛCDM matter
density field, characterizing in detail its real space properties.
The main estimators of the two-point correlation function 
are discussed in Section 5 and in Section 6 we test these 
estimators in artificial distributions. Finally in Section 7 
we draw our main conclusions discussing the problems re- 
lated to the estimations of two-point correlations in real 
galaxy samples. 



2. Basic definitions 

The microscopic number density function for any particle² distribution is given by

$$n(\mathbf{x}) = \sum_{i=1}^{N} \delta_D(\mathbf{x} - \mathbf{x}_i) , \tag{1}$$

where $\mathbf{x}_i$ is the position of the $i$-th particle, $\delta_D$ is the Dirac delta function and the sum is over the $N$ particles of the system.



² We make explicit the fact that we consider particle distributions. However most of the definitions given hereafter can be easily extended to the case of a continuous matter density field. We refer to Gabrielli et al. (2004) for more details.



For a system in which the mean density $n_0$ is well defined and positive, it is convenient to define the density contrast:

$$\delta(\mathbf{x}) = \frac{n(\mathbf{x}) - n_0}{n_0} . \tag{2}$$



In order to characterize the two-point correlation properties of the density fluctuations, one can then use the reduced two-point correlation function (hereafter simply two-point correlation function):

$$\tilde\xi(\mathbf{r}) = \langle \delta(\mathbf{x} + \mathbf{r}) \, \delta(\mathbf{x}) \rangle , \tag{3}$$

where $\langle ... \rangle$ is the ensemble average, i.e., an average over all possible realizations of the system. In a distribution of discrete particles $\tilde\xi(\mathbf{r})$ always has a Dirac delta function singularity at $r = 0$, which it is convenient to separate by defining $\xi(\mathbf{r})$ for $r \neq 0$ (the "off-diagonal" part; see e.g. Peebles 1980)

$$\tilde\xi(\mathbf{r}) = \frac{1}{n_0} \delta_D(\mathbf{r}) + \xi(\mathbf{r}) . \tag{4}$$

The normalized variance of particle number (or mass) is an integrated quantity defined as:

$$\sigma^2(r) = \frac{\langle N(r)^2 \rangle - \langle N(r) \rangle^2}{\langle N(r) \rangle^2} , \tag{5}$$

where $N(r)$ is the number of particles inside, for example, a sphere of radius $r$. Then $\sigma^2(r)$ can be used, in a manner similar to $\xi(r)$, to distinguish a regime of large fluctuations ($\sigma^2 > 1$) from a regime of small fluctuations where $\sigma^2 < 1$. It is simple to find the explicit expression for the normalized variance of particle number in a volume $V$ in terms of a double integral of $\tilde\xi(r)$ (see, e.g., Peebles, 1980)

$$\sigma^2(V) = \frac{1}{V^2} \int_V \int_V \tilde\xi(|\mathbf{r}_1 - \mathbf{r}_2|) \, d^3r_1 \, d^3r_2 . \tag{6}$$

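If we consider distributions which are periodic in a cube of side $L$, we can write the density contrast as a Fourier series:

$$\delta(\mathbf{x}) = \frac{1}{L^3} \sum_{\mathbf{k}} \exp(i \mathbf{k} \cdot \mathbf{x}) \, \tilde\delta(\mathbf{k}) , \tag{7}$$

with $\mathbf{k} \in \{ (2\pi/L)\,\mathbf{n} \mid \mathbf{n} \in \mathbb{Z}^3 \}$. The coefficients $\tilde\delta(\mathbf{k})$ are given by

$$\tilde\delta(\mathbf{k}) = \int_{L^3} \delta(\mathbf{x}) \exp(-i \mathbf{k} \cdot \mathbf{x}) \, d^3x . \tag{8}$$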

The power-spectrum of a particle distribution is then defined (see e.g., Peebles, 1980) as

$$P(\mathbf{k}) = \frac{1}{L^3} \langle |\tilde\delta(\mathbf{k})|^2 \rangle . \tag{9}$$

In point distributions which are statistically homogeneous, the power-spectrum and the non-diagonal part of the two-point correlation function $\xi(\mathbf{r})$ are a Fourier conjugate pair:

$$\xi(\mathbf{r}) = \frac{1}{(2\pi)^3} \int d^3k \, P(\mathbf{k}) \exp(-i \mathbf{k} \cdot \mathbf{r}) \tag{10}$$

and

$$P(\mathbf{k}) = \int d^3r \, \xi(\mathbf{r}) \exp(i \mathbf{k} \cdot \mathbf{r}) . \tag{11}$$

Fig. 1. Power-spectrum given by Eq. (12), as a function of $k$ (h/Mpc). The linear behavior at small $k$ is shown as a reference. The amplitude at small $k$ and the scale $k_c = 0.014$ h/Mpc are chosen to be the same as those of the ΛCDM model discussed in what follows. The vertical line indicates the wavenumber $k_c$.



Since for both $\xi(\mathbf{r})$ and $P(\mathbf{k})$ we consider only the dependence on the modulus of their arguments, we will denote them from now on as $\xi(r)$ and $P(k)$, meaning that they are obtained by performing an average over the directions of $\mathbf{r}$ and $\mathbf{k}$ respectively.
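As an illustration of Eqs. (7)-(9), the following minimal Python sketch estimates the power-spectrum of a particle distribution in a periodic cube. It assumes that, for $\mathbf{k} \neq 0$, the coefficient of Eq. (8) evaluated on the microscopic density of Eq. (1) reduces to $(L^3/N) \sum_i \exp(-i \mathbf{k} \cdot \mathbf{x}_i)$. The function names, the box size and the numbers of particles are illustrative choices and are not taken from the analysis of this paper.

```python
import cmath
import math
import random

def mode_power(points, L, n):
    """Single-realization estimate of P(k) for k = (2*pi/L) * n, n != (0, 0, 0)."""
    kx, ky, kz = (2.0 * math.pi / L * c for c in n)
    # Sum of plane waves over the particle positions
    s = sum(cmath.exp(-1j * (kx * x + ky * y + kz * z)) for x, y, z in points)
    delta_k = (L**3 / len(points)) * s  # assumed form of Eq. (8) for k != 0
    return abs(delta_k) ** 2 / L**3     # Eq. (9) without the ensemble average

def mean_power(points, L, nmax=2):
    """Average of mode_power over all non-zero modes with |n_i| <= nmax."""
    r = range(-nmax, nmax + 1)
    modes = [(a, b, c) for a in r for b in r for c in r if (a, b, c) != (0, 0, 0)]
    return sum(mode_power(points, L, n) for n in modes) / len(modes)

L = 1.0
rng = random.Random(0)

# Uncorrelated (Poisson) points
poisson = [(rng.random() * L, rng.random() * L, rng.random() * L)
           for _ in range(2000)]

# Simple cubic lattice with M sites per side
M = 10
lattice = [((i + 0.5) * L / M, (j + 0.5) * L / M, (k + 0.5) * L / M)
           for i in range(M) for j in range(M) for k in range(M)]

p_poisson = mean_power(poisson, L)
p_lattice = mean_power(lattice, L)
n0 = len(poisson) / L**3

print("Poisson: n0 * <P> =", p_poisson * n0)  # expected to scatter around 1
print("lattice: <P> =", p_lattice)            # vanishes up to rounding on these modes
```

For uncorrelated points the estimate fluctuates around the constant $1/n_0$, while for the lattice it vanishes on all modes below the first reciprocal-lattice vector: a simple example of a distribution that is more ordered than a Poisson one.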

3. A toy model and the problem of sampling 

In order to illustrate some key features of standard cosmological models, let us consider a simple matter density field power-spectrum of the type:

$$P(k) = A \, k \exp(-k/k_c) . \tag{12}$$

This is characterized by an amplitude $A$, which fixes the small $k$ behavior, and by the turnover scale $k_c$ (see Fig. 1).

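As already mentioned, the two-point correlation function is simply the Fourier transform (FT) of the power-spectrum: for Eq. (12), by using Eq. (10), we find

$$\xi(r) = \frac{A}{\pi^2} \, \frac{3/k_c^2 - r^2}{\left(1/k_c^2 + r^2\right)^3} . \tag{13}$$

This correlation function presents the zero point at the intrinsic characteristic scale

$$r_c = \frac{\sqrt{3}}{k_c} . \tag{14}$$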



Fig. 2. Absolute value of the two-point correlation function given by Eq. (13) divided by $\xi(0)$, as a function of $r$ (Mpc/h). The (negative) power-law $r^{-4}$ is shown as a reference. The vertical line indicates the scale $r_c$.

At small scales $r \ll r_c$ Eq. (13) gives $\xi(r) \approx \mathrm{const.} > 0$, while at large scales $r \gg r_c$ the amplitude of $\xi(r)$ becomes negative, going to zero for $r \to \infty$ with a power-law tail of the type $\xi(r) \sim -r^{-4}$ (see Fig. 2).
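A minimal numerical sketch of the behavior just described, assuming the toy power-spectrum $P(k) = A k \exp(-k/k_c)$ of Eq. (12). Its isotropic Fourier transform can be carried out in closed form, giving $\xi(r) = (A/\pi^2)(3a^2 - r^2)/(a^2 + r^2)^3$ with $a = 1/k_c$; the snippet checks this expression against a direct numerical transform and locates the zero crossing at $\sqrt{3}/k_c$. The amplitude $A = 1$ and the function names are arbitrary placeholders.

```python
import math

def xi_toy(r, A=1.0, kc=0.014):
    """Closed-form transform of P(k) = A k exp(-k/kc), with a = 1/kc."""
    a = 1.0 / kc
    return (A / math.pi**2) * (3.0 * a**2 - r**2) / (a**2 + r**2) ** 3

def xi_numeric(r, A=1.0, kc=0.014, n=20000):
    """Simpson estimate of (1 / 2 pi^2) int dk k^2 P(k) sin(kr)/(kr)."""
    kmax = 40.0 * kc          # the exponential cut-off makes the tail negligible
    h = kmax / n              # n must be even for Simpson's rule

    def f(k):
        x = k * r
        j0 = 1.0 if x == 0.0 else math.sin(x) / x
        return k**2 * (A * k * math.exp(-k / kc)) * j0

    s = f(0.0) + f(kmax)
    s += sum((4.0 if i % 2 else 2.0) * f(i * h) for i in range(1, n))
    return s * h / 3.0 / (2.0 * math.pi**2)

kc = 0.014                    # h/Mpc, the value quoted in the caption of Fig. 1
r_c = math.sqrt(3.0) / kc     # zero crossing, in Mpc/h

print("r_c =", round(r_c, 1), "Mpc/h")
for r in (10.0, 50.0, 200.0, 1000.0):
    print(r, xi_toy(r, kc=kc), xi_numeric(r, kc=kc))
```

With $k_c = 0.014$ h/Mpc the zero crossing falls at about 124 Mpc/h; the function is positive below it and negative above it, and at very large $r$ it approaches $-(A/\pi^2)\, r^{-4}$.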

The region of positive correlation is thus followed by an (infinite) region where there are anti-correlations. Positive and negative correlations are exactly balanced so that

$$\int_0^\infty \xi(r) \, r^2 \, dr = 0 . \tag{15}$$

This is equivalent to the condition that $P(k) \to 0$ for $k \to 0$. As discussed in Gabrielli, Joyce and Sylos Labini (2002) (see also Gabrielli et al., 2004) this corresponds to the fact that the distribution is globally super-homogeneous, i.e. more ordered than an uncorrelated (i.e. Poisson) distribution. This subtle property can be clarified by computing the mass variance.
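The balance of Eq. (15) can be checked numerically. The sketch below assumes the closed-form correlation function of the toy model, $\xi(r) = (A/\pi^2)(3a^2 - r^2)/(a^2 + r^2)^3$ with $a = 1/k_c$, and evaluates the partial integral $J(R) = \int_0^R \xi(r)\, r^2\, dr$. Since the integrand is the derivative of $(A/\pi^2)\, r^3/(a^2 + r^2)^2$, $J(R)$ has the simple closed form used for comparison. Names and parameter values are illustrative.

```python
import math

def xi_toy(r, A=1.0, kc=0.014):
    a = 1.0 / kc
    return (A / math.pi**2) * (3.0 * a**2 - r**2) / (a**2 + r**2) ** 3

def balance_numeric(R, n=2000, A=1.0, kc=0.014):
    """Simpson estimate of J(R) = int_0^R xi(r) r^2 dr (n even)."""
    h = R / n

    def f(r):
        return xi_toy(r, A, kc) * r * r

    s = f(0.0) + f(R)
    s += sum((4.0 if i % 2 else 2.0) * f(i * h) for i in range(1, n))
    return s * h / 3.0

def balance_exact(R, A=1.0, kc=0.014):
    """Closed form of J(R): (A / pi^2) R^3 / (a^2 + R^2)^2."""
    a = 1.0 / kc
    return (A / math.pi**2) * R**3 / (a**2 + R**2) ** 2

r_c = math.sqrt(3.0) / 0.014
for R in (50.0, r_c, 1e3, 1e4, 1e6):
    print(R, balance_exact(R))
print(balance_numeric(200.0), balance_exact(200.0))
```

In this model $J(R)$ is positive for every finite $R$, is largest at $R = r_c$, and decays as $1/R$: the cancellation between positive and negative correlations is reached only in the limit $R \to \infty$.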

To evaluate the mass variance (Eq. (5)) one may choose as the volume of integration $V$ a sphere in real space of radius $R$. In this case, going into Fourier space, Eq. (6) becomes (see e.g., Peebles, 1980)

$$\sigma^2(R) = \frac{9}{2\pi^2} \int_0^\infty dk \, k^2 P(k) \, \frac{\left( \sin(kR) - kR \cos(kR) \right)^2}{(kR)^6} . \tag{16}$$

By considering the power-spectrum given by Eq. (12) one finds that $\sigma^2(R) \sim \mathrm{const.}$ for $R < r_c$ and $\sigma^2(R) \sim R^{-4}$ for $R > r_c$ (see Fig. 3). This fast decay of the mass variance is the distinctive feature of super-homogeneous mass distributions and it is strictly related to the condition $P(0) = 0$. This is the fastest decay possible for any isotropic translationally invariant distribution of points (see discussion in Gabrielli, Joyce and Sylos Labini, 2002).
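A minimal sketch of the mass variance computation for the toy model, assuming the standard top-hat window, i.e. $\sigma^2(R) = (9/2\pi^2) \int dk\, k^2 P(k)\, [(\sin x - x \cos x)/x^3]^2$ with $x = kR$, and $P(k) = A k \exp(-k/k_c)$. The amplitude $A = 1$, the integration cut-off and the number of integration points are arbitrary choices made for illustration.

```python
import math

def window(x):
    """(sin x - x cos x) / x^3, which tends to 1/3 for x -> 0."""
    if x < 1e-2:
        return 1.0 / 3.0 - x * x / 30.0   # series expansion, avoids cancellation
    return (math.sin(x) - x * math.cos(x)) / x**3

def sigma2(R, A=1.0, kc=0.014, n=200_000):
    """Simpson estimate of the variance in spheres of radius R (n even)."""
    kmax = 40.0 * kc                      # exp(-k/kc) is negligible beyond this
    h = kmax / n

    def f(k):
        return k**2 * (A * k * math.exp(-k / kc)) * window(k * R) ** 2

    s = f(0.0) + f(kmax)
    s += sum((4.0 if i % 2 else 2.0) * f(i * h) for i in range(1, n))
    return 9.0 / (2.0 * math.pi**2) * s * h / 3.0

kc = 0.014
small = sigma2(0.01, kc=kc)               # R << r_c: plateau
s1, s2 = sigma2(5000.0, kc=kc), sigma2(10000.0, kc=kc)
slope = math.log(s2 / s1) / math.log(2.0) # local logarithmic slope for R >> r_c

print("plateau:", small, "expected:", 3.0 * kc**4 / math.pi**2)
print("slope between R = 5000 and 10000 Mpc/h:", slope)
```

At small $R$ the variance tends to the constant $3 A k_c^4/\pi^2 = \xi(0)$. At large $R$ the local slope comes out close to, but slightly shallower than, $-4$: for a spectrum with $P(k) \sim k$ the $R^{-4}$ decay in spheres is modulated by a slowly varying logarithmic factor, and in any case it is clearly steeper than the Poisson value $-3$.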

For a Poisson distribution one finds that the mass variance decays more slowly than for a super-homogeneous distribution, i.e. $\sigma^2(R) \sim R^{-3}$, and that the power-spectrum obeys

$$\lim_{k \to 0} P(k) = \mathrm{const.} > 0 . \tag{17}$$
Fig. 3. Variance in real space spheres for Eq. (12), as a function of $r$ (Mpc/h). A line with slope $r^{-4}$ is shown as a reference. The vertical line indicates the scale $r_c$.

A similar situation occurs when the distribution has positive correlations at small scales and no correlations at large scales, i.e. a substantially Poisson distribution. On the other hand, in the presence of long-range positive correlations, as for example a power-law correlation function $\xi(r) \sim r^{-\gamma}$ with $0 < \gamma < 3$, the mass variance decays more slowly than in the Poisson case, i.e. $\sigma^2(R) \sim R^{-\gamma}$, and the power-spectrum satisfies the condition

$$\lim_{k \to 0} P(k) = \infty . \tag{18}$$



3.1. Sampling a density field 

What we have just described is a simple toy model power-spectrum which captures some essential features of the theoretical correlation properties of the matter density field. In the discussion of real galaxy samples, one has to consider that luminous objects trace the underlying dark matter density field and that they can be regarded as a sampling of it: for example they can be supposed to lie in the highest peaks of the fluctuation field, because only there has gravitational clustering been efficient enough to form self-gravitating objects. The problem of sampling is thus a central one in studies of cosmological density fields and, particularly, of galaxy structures. More precisely, by sampling we mean the operation performed when one extracts, from a given distribution, a subsample by using a selection criterion based on a certain parameter characterizing the distribution. For example, one can make this type of selection by extracting, from the whole population of galaxies of all luminosities, only those objects whose luminosity is brighter than a given threshold; alternatively a similar selection can be done by considering galaxy color. In the case where the fluctuation field is a stochastic variable of position (for example a Gaussian fluctuation field) one may sample the distribution by selecting only fluctuations larger than a given threshold in the density fluctuation field.

In general the problem consists in understanding the relations between the statistical properties of the sampled, or biased, distribution and those of the original one. Of particular interest is the relation between the two-point correlation function of the sampled field and the original $\xi(r)$. This is so because, for instance, in the studies of galaxy samples, one naturally has to perform a sampling when measuring the two-point correlation function of galaxies of a certain luminosity. In the comparison of observations with theoretical models the sampling procedure is strictly related to the physics of the system. In fact, in the analysis of cosmological N-body simulations one also needs to extract subsamples of points which, according to some models, would represent galaxies instead of dark matter particles. In these contexts, the simplest theoretical model describing biasing (introduced by Kaiser, 1984) was developed for a continuous Gaussian field, and thus it does not represent a useful analytical treatment of the problem of strong clustering, which is instead the relevant one for galaxy structures.

However, it is very difficult to treat the problem of sampling in a generic case unless one can specify in detail the correlation properties of the original distribution and the specific procedure used to make the sampling. This task is beyond current knowledge even for the case of artificial distributions generated by gravitational N-body simulations, where one can adopt a phenomenological approach. For this reason, we limit the discussion to the threshold sampling of Gaussian random fields, because this allows us to point out some key features which characterize the case in which the underlying density field has super-homogeneous type correlations and the sampling is local (i.e. related to local features of the distribution). This cannot be regarded as a realistic example for the reasons discussed above, but it allows one to identify several key problems which should be addressed in detail, by means of studies of artificial distributions generated, for example, by N-body simulations, in order to understand a more realistic case.



3.2. Sampling a Gaussian random field 

Fig. 4. Absolute value of the correlation function of the toy model described by Eq. (12) (solid line) and of the ones corresponding to different values of the threshold parameter $\nu$, calculated by applying Eq. (20). The amplification is non-linear at small scales, where $\xi_\nu > 1$, linear at large scales, and the zero-crossing scale is invariant under biasing.

Let us now discuss the simplest biasing scheme of a continuous and correlated Gaussian field (hereafter we follow Durrer et al., 2003). Suppose we have a Gaussian random field with two-point correlation $\xi(r)$ and such that the variance is $\langle \mu^2 \rangle = \sigma^2$ (where $\mu$ is the fluctuation normalized to the mean density). One can identify the fluctuations of the field that are larger than $\nu$ times the standard deviation $\sigma$. This selection defines a biased field whose weight is equal to zero if the fluctuations of the original field are smaller than the threshold $\nu\sigma$ and equal to one if they are equal to or larger than it. When one changes the threshold $\nu$ one selects different regions of the underlying Gaussian random field, corresponding to fluctuations of differing amplitudes. The two-point correlation function of the selected objects is then that of the peaks, $\xi_\nu(r)$.
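The selection rule described above can be sketched in a few lines of Python. The example draws uncorrelated Gaussian values (so it illustrates only the zero/one weighting, not the spatial correlations of the field) and compares the selected fraction with the Gaussian expectation $\frac{1}{2}\,\mathrm{erfc}(\nu/\sqrt{2})$. Function names, the sample size and the seed are illustrative assumptions.

```python
import math
import random

def threshold_weights(values, sigma, nu):
    """Weight 1 for fluctuations >= nu * sigma, weight 0 otherwise."""
    cut = nu * sigma
    return [1 if v >= cut else 0 for v in values]

def selected_fraction(values, sigma, nu):
    w = threshold_weights(values, sigma, nu)
    return sum(w) / len(w)

rng = random.Random(1)
sigma = 1.0
field = [rng.gauss(0.0, sigma) for _ in range(200_000)]

for nu in (0.5, 1.0, 2.0):
    measured = selected_fraction(field, sigma, nu)
    expected = 0.5 * math.erfc(nu / math.sqrt(2.0))
    print(nu, measured, expected)
```

Raising the threshold $\nu$ rapidly reduces the fraction of the field that is retained, which is why high-threshold samples are sparse and more affected by sampling noise.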

We define the two-point correlation function of the normalized field

$$\xi_c(r) = \frac{\xi(r)}{\xi(0)} , \tag{19}$$

where $\xi(0)$ is the variance of the field, so that $\xi_c(r) \le 1 \;\; \forall r$. It is possible to compute the following first-order approximation (Durrer et al., 2003)

$$\xi_\nu(r) \simeq \sqrt{\frac{1 + \xi_c(r)}{1 - \xi_c(r)}} \, \exp\left( \nu^2 \frac{\xi_c(r)}{1 + \xi_c(r)} \right) - 1 , \tag{20}$$

which reduces to $\xi_\nu(r) \simeq \nu^2 \xi_c(r)$ when $\nu^2 |\xi_c(r)| \ll 1$. Thus, if present in the underlying distribution, the characteristic length scale of the zero point $r_c$ is not changed under this selection procedure, i.e.

$$\xi_\nu(r_c) = \xi(r_c) = 0 . \tag{21}$$



On the other hand, for $\xi_\nu(r) > 1$ the amplification is non-linear as a function of scale: this means that the functional behavior of $\xi_\nu(r)$ is different from that of $\xi(r)$ in the regime where $\xi_\nu(r) > 1$. Fig. 4 shows the situation when one takes the correlation function of the toy model discussed in the previous section (see Eq. (13)) as the $\xi(r)$ of the underlying Gaussian field.
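A small sketch of the amplification under threshold biasing. It assumes that the first-order expression of Eq. (20) has the form $\xi_\nu = \sqrt{(1 + \xi_c)/(1 - \xi_c)}\, \exp[\nu^2 \xi_c/(1 + \xi_c)] - 1$; this form, like the function name, is an assumption of the example and should be checked against Durrer et al. (2003) before being used quantitatively.

```python
import math

def biased_xi(xi_c, nu):
    """Assumed form of Eq. (20); requires -1 < xi_c < 1."""
    root = math.sqrt((1.0 + xi_c) / (1.0 - xi_c))
    return root * math.exp(nu * nu * xi_c / (1.0 + xi_c)) - 1.0

nu = 3.0
for xi_c in (-1e-3, 0.0, 1e-6, 1e-3, 0.1, 0.5):
    value = biased_xi(xi_c, nu)
    ratio = value / xi_c if xi_c != 0.0 else float("nan")
    print(xi_c, value, ratio)
```

In this form the zero of $\xi_c$ is mapped exactly onto a zero of $\xi_\nu$ and the sign is preserved, so the zero-crossing scale does not move. For small $|\xi_c|$ the amplification is linear, with a factor $1 + \nu^2 \approx \nu^2$ for $\nu \gg 1$, while for $\xi_c$ of order unity it grows much faster than linearly.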

Fig. 5. Power-spectrum of the toy model described by Eq. (12) (solid line) and of the ones corresponding to different values of the threshold parameter $\nu$, calculated by applying Eq. (20) and then by performing the Fourier transform. Because of the asymmetrical amplification of the correlation function at small and at large scales the condition of super-homogeneity is broken, i.e. the power-spectrum no longer shows the tail $P(k) \sim k$.

Given the asymmetrical amplification at small and at large scales, the condition of super-homogeneity is broken,

$$\int_0^\infty \xi_\nu(r) \, r^2 \, dr > 0 , \tag{22}$$

and thus the power-spectrum no longer shows the tail $P(k) \sim k$ (see Fig. 5). Correspondingly the mass variance shows the typical features of a substantially Poisson system beyond the scale $r_c$, i.e. it decays as $r^{-3}$.
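The breaking of the balance can be checked numerically for the toy model. The sketch below makes two assumptions: the normalized toy correlation function is $\xi_c(u) = (1 - u^2/3)/(1 + u^2)^3$ with $u = k_c r$ (from the closed-form transform of Eq. (12)), and Eq. (20) has the form $\xi_\nu = \sqrt{(1 + \xi_c)/(1 - \xi_c)}\, \exp[\nu^2 \xi_c/(1 + \xi_c)] - 1$. The integral over $u$ is mapped onto a finite interval with $u = \tan\theta$ and evaluated with the midpoint rule; all names are illustrative.

```python
import math

def xi_c(u):
    """Normalized toy correlation function, u = kc * r (assumed closed form)."""
    return (1.0 - u * u / 3.0) / (1.0 + u * u) ** 3

def biased(xc, nu):
    """Assumed form of Eq. (20)."""
    return math.sqrt((1.0 + xc) / (1.0 - xc)) * math.exp(nu * nu * xc / (1.0 + xc)) - 1.0

def balance(func, n=20000):
    """Midpoint estimate of int_0^inf func(u) u^2 du, using u = tan(theta)."""
    h = 0.5 * math.pi / n
    total = 0.0
    for i in range(n):
        u = math.tan((i + 0.5) * h)
        total += func(u) * u * u * (1.0 + u * u)  # du = (1 + u^2) d(theta)
    return total * h

underlying = balance(xi_c)
biased_nu2 = balance(lambda u: biased(xi_c(u), 2.0))

# Both integrals are in units of kc^-3
print("underlying field:", underlying)
print("biased field, nu = 2:", biased_nu2)
```

The integral vanishes for the underlying field, while for the thresholded field it is strictly positive: the positive small-scale part is amplified non-linearly, whereas the negative tail is amplified only linearly.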

Summarizing the behaviors for the toy model described by Eq. (12), we obtain that:

- (i) The correlation function of the biased field still presents some key features of the original correlation function, namely the same characteristic scale $r_c$ and the same negative tail $\xi(r) \sim -r^{-4}$ at large scales.

- (ii) The power-spectrum is distorted in a non-linear way at all scales by biasing; in particular at large scales it is characterized by the typical behavior of a Poisson distribution. The same situation occurs for the mass variance.

We expect these to be general features of biased fields when the underlying density field has super-homogeneous type correlations (Durrer et al. 2003, Gabrielli et al., 2004). The cancellation of the super-homogeneous features is due to the fact that the operation of selection introduces a noise, due to the sampling itself, which dominates the intrinsic fluctuations of the system.

4. Real space correlations in CDM-type models 

In this section we consider the case of a distribution with correlation properties of CDM type. In particular we study the case of the so-called ΛCDM "vanilla" model. The functional behavior and the parameters defining this model are discussed in Tegmark et al. (2004) and Spergel et al. (2007). Without entering into the details of the model, we note here that, while the different cosmological parameters may change the behavior of the power-spectrum in a non-linear way, it is generally assumed that the bias factor $b$ (we now indicate by $b$ what in the previous section we have called $\nu$, in order to make clear that the latter symbol refers only to the case of a correlated Gaussian field) corresponds to an overall rescaling of its amplitude:

$$P(k) = b^2 P_{dm}(k) , \tag{23}$$

where $P_{dm}(k)$ represents the power-spectrum of the underlying dark matter field and $P(k)$ is the "biased" power-spectrum, corresponding to the power-spectrum of a field selected by following a certain prescription. As discussed above, Eq. (23) has no theoretical justification in the framework of Gaussian fields, either at small $k$ or at large $k$. Rather, in numerical simulations it has been found phenomenologically that this is a good working hypothesis in the regime of strong clustering (Springel et al., 2005).

Fig. 6. Power-spectrum for the ΛCDM model (Eq. (24)), as a function of $k$ (h/Mpc). The power law $P(k) \sim k$ and the power-law decay at large $k$ are shown as a reference.

Fig. 7. Absolute value of the two-point correlation function for the ΛCDM model (Eq. (24)). The small-scale power law and the large-scale $r^{-4}$ tail are shown as a reference.

In order to compute the real space properties it is useful to find an analytical approximation to the theoretical power-spectrum, which is obtained numerically (we use hereafter the data from Tegmark et al., 2004). We have found that the following expression provides a good fitting formula

$$P(k) = \frac{A \, k}{1 + B \, (k/k_1)^{\nu_1} + (k/k_2)^{\nu_2}} , \tag{24}$$



where A = 5 · 10^, B = 10^, $k_1 = 0.35$ h/Mpc, $\nu_1 = 2.3$, $k_2 = 0.05$ h/Mpc and $\nu_2 \approx 3.5$. This power-spectrum is characterized by a turnover scale $k_c \approx 0.014$ h/Mpc, which separates the large-scale behavior $P(k) \sim k$ from the negative power-law behavior at small scales.

In this case it is not possible to calculate the real space correlation function analytically, but it can be obtained from the numerical computation of the Fourier transform of the power-spectrum by using Eq. (10). The result is shown in Fig. 7. As for the case of the toy model discussed in the previous section, this correlation function is characterized by a positive region at small scales, where in this case it decays roughly as a power law, and by a large-scale negative tail $\xi(r) \sim -r^{-4}$. The length scale which separates these two regimes is the zero point, which represents the unique characteristic length scale of this model: for the parameters chosen in Eq. (24) we find $r_c \approx 124$ Mpc/h.

A reasonable fit to the correlation function obtained
by performing the FT is (see Fig. 8)



^ (4 + ^^) 



r/3 



(25) 



where A = 5 • 10^ and k^ = 0.014h/Mpc and /3 = 1.4. 

It is interesting to note that if we compute the power-spectrum by calculating the FT of the correlation function with the analytical approximation given by Eq. (25), although the fit is very good over the whole range of scales considered, we do not get the correct behavior at small wavenumbers, i.e. $P(k) \sim k$ for $k < k_c$: instead we get $P(k) \sim \mathrm{const.}$ for $k < k_c$ (see Fig. 9). This is because the small approximation introduced in Eq. (25) is such that the integral

$$\int_0^\infty \xi(r) \, r^2 \, dr > 0 , \tag{26}$$

and thus there is no perfect cancellation between the positive and negative parts, which is the typical feature of super-homogeneous distributions, characterized by an extreme fine-tuning of the correlations. This simple example shows how sensitive the condition of super-homogeneity is and gives a feeling for the kind of problems which can arise in the framework of sampling. In general, the amplification of the correlation function due to selection (or bias) is not linear and gives rise to a behavior like the one just described, i.e. to a radical change of the super-homogeneous properties. That is, the distribution becomes substantially Poisson on scales larger than $r_c$ because of the noise introduced by sampling, although the negative $r^{-4}$ tail in the correlation function is still present.

Fig. 8. Correlation function for the ΛCDM model (Eq. (24)) and the approximation given by Eq. (25).

Fig. 9. Power-spectrum for the ΛCDM model and the approximation obtained from Eq. (11) by using Eq. (25).



4.1. Main features of the real space two-point correlation function

As discussed above, the regime of large fluctuations $\xi(r) > 1$ is not predictable by a theoretical approach, and thus both the amplitude and the shape of the correlation function have to be constrained by observations. Any specific model of the matter density field, however, predicts the behavior of the correlation function in the regime $|\xi(r)| < 1$.



Fig. 10. Plot of the function $\xi(r) \times r$ for the ΛCDM model (Eq. (24)) and, for comparison, for the cases in which the amplitude of Eq. (21) has been multiplied by a factor 10 and by a factor 1/10. The dashed lines correspond to the thresholds $\xi(r) = 0.01, 0.001$.

We discuss, as an interesting example, the case of the ΛCDM model mentioned above.

In general, it is possible to characterize the approach of the correlation function to the zero point, in a range of scales such that $0 < \xi(r)$. For the ΛCDM model we find that in this range of scales a good and useful approximation is given by

$$\xi(r) \approx A\, \frac{\exp(-r/\lambda)}{r^{\gamma}} , \qquad (27)$$

where $A = 3 \cdot 10^{-1}$, $\lambda = 25$ Mpc/h and $\gamma = 1$, while $A = 3$ ($0.03$) when the amplitude of Eq. (24) is multiplied by a factor 10 (1/10). The result is shown in Fig. 10. The exponential cut-off, independent of bias, is related to the fact that $\xi(r)$ crosses zero at $r_c \approx 124$ Mpc/h. Thus, while the direct identification of the zero-point scale is clearly very difficult in a finite sample (see discussion below), because of stochastic and systematic noise in the estimators, the approach to the zero point is, in this model, very well defined. In particular, the correlation function presents an exponential decay in the range of scales [10,100] Mpc/h.
Depending on the amplitude of $\xi(r)$, this range of scales lies largely in the region where $\xi(r) > a$, with $a$ of the order of the thresholds marked in Fig. 10 ($10^{-2}$ to $10^{-3}$); this is a region where observations may provide statistically robust samples, for a bias factor of order one and for the parameters considered here.

4.2. The Baryonic Bump 

As mentioned in the introduction, according to the physics of the early universe, sound waves propagating in the first ~ 400,000 years after the Big Bang produce an additional characteristic length scale in the matter and radiation density fields. With galaxy surveys it would be possible to detect this acoustic feature as a bump in the correlation function at ~ 100 Mpc/h. The amplitude of this bump is controlled by the baryon density, the matter density and the Hubble constant (see Eisenstein et al., 2005 for a detailed discussion). It is interesting to note that this bump corresponds to a non-analytical point of the correlation function, which gives rise to a cosinusoidal modulation of the power-spectrum (see Gabrielli et al., 2004).

Fig. 11. Absolute value of the two-point correlation function for the ΛCDM model (Eq. (24)) and for the same model with the baryonic bump (BB) at ~ 100 Mpc/h.

In Fig. 11 we show a typical example (in this case the matter density is $\Omega_m = 0.12\,h^{-2}$ and the baryon density is $\Omega_b = 0.024\,h^{-2}$). As one may notice from this figure, the bump appears as a very small amplitude feature of the two-point correlation function localized at about 100 Mpc/h, i.e. where the correlation function shows the sharp break corresponding to the approach to the zero point, which fixes the global shape of the correlation function at those scales. As we discuss below, one of the main problems in the estimation of the correlation function at such scales in a given finite sample is to establish whether the break of the power-law behavior, that is the overall shape corresponding to the presence of the zero-point scale, is biased or not by a finite size effect. Once one can be sure enough that the shape is not affected by systematic effects, one may try to characterize the presence of the baryonic feature.

5. Estimation of the correlation function

Different estimators of the two-point correlation function have been introduced and discussed in the literature. The difference between them lies in their respective methods of edge correction (Kerscher, Szapudi and Szalay, 2000), which give rise to different variances and systematic effects or biases. We discuss three of them: (i) the full-shell (FS) estimator (Gabrielli et al., 2004), (ii) the Landy and Szalay (LS) estimator (Landy and Szalay, 1993) and (iii) the Davis and Peebles (DP) estimator (Davis and Peebles, 1983). The first has the advantage that all biases can be carefully understood and possibly kept under control. The second is very popular because it has the minimal variance for the case of a Poisson distribution, although it has not been demonstrated that the same minimal variance applies in the case of correlated distributions (see e.g. Kerscher, Szapudi and Szalay, 2000). However, it has the disadvantage that its biases are very poorly understood in the general case, as is also true of the DP estimator. Although there have been several studies of these estimators (see e.g. Kerscher, 1999 and Kerscher, Szapudi and Szalay, 2000), systematic tests for biases are still not completely developed. Here we give an introduction to the problem and analyze the case of the FS estimator, while in the next section we try to quantify the problem by studying numerical simulations.

Note that there are at least three other estimators known in the literature, the natural estimator, the Hewett estimator and the Hamilton estimator, which are generally biased, as are the LS and DP estimators. In a detailed comparison between these estimators performed by Kerscher et al. (2000) it is reported that the performance of the LS estimator is almost indistinguishable from that of the Hamilton estimator. In addition Kerscher et al. (2000), after a careful study, have stressed that the LS estimator is the recommended one. For this reason we focus our study on the LS estimator, while we have chosen the DP estimator because it is commonly used in the literature.

5.1. Bias in the estimators 



Let us call $\overline{X}(V)$ the statistical estimator of an average quantity $\langle X \rangle$ in a volume $V$ (where $\langle X \rangle$ denotes the ensemble average and $\overline{X}$ the sample average). In order to be a valid estimator, $\overline{X}(V)$ must satisfy (Gabrielli et al., 2004)

$$\lim_{V \to \infty} \overline{X}(V) = \langle X \rangle . \qquad (28)$$

A stronger condition is that the ensemble average of the estimator, in a finite volume $V$, is equal to the ensemble average $\langle X \rangle$:

$$\langle \overline{X}(V) \rangle = \langle X \rangle . \qquad (29)$$

An estimator is called unbiased if this condition is satisfied; otherwise there is a systematic bias in the finite volume relative to the ensemble average. Any estimator $\overline{\xi}(r)$ of the correlation function $\xi(r)$ is generally biased. This is because the estimation of the sample mean density is biased when correlations extend over the sample size and beyond. In fact the most common estimator of the average density is

$$\overline{n} = \frac{N}{V} , \qquad (30)$$

where $N$ is the number of points in a sample of volume $V$. It is simple to show that (see, e.g., Gabrielli et al., 2004)



1 



(31) 



Therefore only in the case $\xi(r) = 0$ (i.e. for a Poisson distribution) is Eq. (30) an unbiased estimator of the ensemble average density.

In Kerscher (1999) one may find a detailed treatment of estimators of the two-point correlation function: it has been shown that in a given sample, on large scales, the biases in the above mentioned estimators are not negligible, especially when there are structures of large spatial extension inside the sample. In ΛCDM models there are structures of large amplitude at small scales, i.e. up to ~ 10 Mpc/h, and structures of large spatial extension and low amplitude up to ~ 120 Mpc/h. Beyond this scale there are no more structures, as the distribution becomes anti-correlated. Thus it is important to understand the problem of biases in relation to estimations in real samples, which may cover a distance scale of only several hundred Mpc/h, i.e. up to about five times the regime of positive correlations.

An analytical treatment of the problem, for the general case, is unfeasible, and thus the most direct way to study biases in the estimators is by performing tests on artificial distributions, which we discuss in the next section. In what follows we present several examples which show the importance of the systematic effect related to Eq. (31), i.e. the fact that the estimators do not satisfy, in general, Eq. (29) but only Eq. (28).

5.2. The full shell estimator 



The correlation function can be written as

$$\xi(r) = \frac{\langle n(r) \rangle_p}{n_0} - 1 , \qquad (32)$$

where the conditional density $\langle n(r) \rangle_p = \langle n(r) n(0) \rangle / n_0$ gives the average density of points in a shell of radius $r$ and thickness $dr$ around an occupied point of the distribution. Thus the FS estimator (Gabrielli et al., 2004) can be simply written as

$$\overline{\xi}(r) = \frac{\overline{\langle n(r) \rangle_p}}{\overline{n}} - 1 , \qquad (33)$$

where $\overline{n}$ is the estimated number density in the sample and $\overline{\langle n(r) \rangle_p}$ is the estimator of the conditional density. The latter can be written as

$$\overline{\langle n(r) \rangle_p} = \frac{1}{N_c(r)} \sum_{i=1}^{N_c(r)} \frac{\Delta N_i(r, \Delta r)}{\Delta V} , \qquad (34)$$

where $\Delta N_i(r, \Delta r)$ is the number of points in the shell of radius $r$, thickness $\Delta r$ and volume $\Delta V = 4\pi r^2 \Delta r$ centered on the $i$-th point of the distribution. Note that the number of points $N_c(r)$ contributing to the average in Eq. (34) is scale dependent, as only those points are considered such that, when chosen as the center of a sphere of radius $r$, the sphere is fully included in the sample volume (see Gabrielli et al., 2004 and Vasilyev, Baryshev and Sylos Labini, 2006, for more details).
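The full-shell prescription above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the tests in this paper: the sample is taken to be a cubic box, the function name `fs_estimator` and its arguments are hypothetical, the exact shell volume is used in place of $4\pi r^2 \Delta r$, and the sample density is simply $N/V$ rather than the choice discussed below.

```python
import math
import random


def fs_estimator(points, box, r, dr):
    """Full-shell estimate of xi at separation r for points in the cube [0, box]^3.

    Only centers whose sphere of radius r + dr lies entirely inside the box
    contribute, so no edge correction is applied.
    """
    r_out = r + dr
    # exact volume of the shell [r, r + dr) (assumption: used instead of 4 pi r^2 dr)
    shell_volume = 4.0 / 3.0 * math.pi * (r_out ** 3 - r ** 3)
    # centers for which the full shell is contained in the sample volume
    centers = [p for p in points
               if all(r_out <= c <= box - r_out for c in p)]
    if not centers:
        raise ValueError("no center has a full shell inside the sample")
    total = 0
    for c in centers:
        for p in points:
            if p is c:
                continue  # do not count the center itself
            if r <= math.dist(c, p) < r_out:
                total += 1
    # estimated conditional density: average shell count per center over shell volume
    cond_density = total / (len(centers) * shell_volume)
    # sample density taken as N / V (assumption made for this sketch)
    mean_density = len(points) / box ** 3
    return cond_density / mean_density - 1.0
```

For an uncorrelated (Poisson) set of points the returned value should fluctuate around zero; the double loop is quadratic in the number of points and is meant for small illustrative samples only.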

The sample density can be estimated in various ways. Suppose that the sample geometry is simply a sphere of radius $R_s$. The most convenient choice in this context is

$$\overline{n} = \frac{3}{4\pi R_s^3} \int_0^{R_s} \overline{\langle n(r) \rangle_p}\, 4\pi r^2\, dr , \qquad (35)$$

as in this case the following integral constraint is satisfied:

$$\int_0^{R_s} \overline{\xi}(r)\, r^2\, dr = 0 . \qquad (36)$$

This condition is satisfied independently of the functional shape of the underlying correlation function $\xi(r)$.
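The integral constraint can be checked directly from the definitions. The short derivation below assumes that the FS estimator of Eq. (33) is $\overline{\xi}(r) = \overline{\langle n(r) \rangle_p} / \overline{n} - 1$ and that Eq. (35) normalizes the sample density as the average of the estimated conditional density over the sphere of radius $R_s$, i.e. $\overline{n} = \frac{3}{4\pi R_s^3} \int_0^{R_s} \overline{\langle n(r) \rangle_p}\, 4\pi r^2 dr$:

```latex
\begin{align}
\int_0^{R_s} \overline{\xi}(r)\, r^2\, dr
 &= \int_0^{R_s} \left[ \frac{\overline{\langle n(r) \rangle_p}}{\overline{n}} - 1 \right] r^2\, dr \\
 &= \frac{1}{4\pi\, \overline{n}} \int_0^{R_s} \overline{\langle n(r) \rangle_p}\, 4\pi r^2\, dr
    - \frac{R_s^3}{3} \\
 &= \frac{1}{4\pi\, \overline{n}} \cdot \frac{4\pi R_s^3}{3}\, \overline{n} - \frac{R_s^3}{3} = 0 .
\end{align}
```

No property of $\xi(r)$ enters the last step, which is why the constraint holds whatever the underlying correlation function is.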

The scale $R_s$, for a sample of arbitrary geometry, is given by the radius of the largest sphere fully contained in the sample volume, for the reasons explained above. Other choices for the estimation of the sample density are possible and give rise to a condition of the type of Eq. (36), even if not precisely the same. This condition introduces a systematic distortion in the measured shape of $\xi(r)$, and the advantage of choosing Eq. (35) lies in the fact that one has a certain control on the scale $r_*$, defined as the scale beyond which the distortion becomes important. The scale $r_*$ must be evaluated given a specific model for $\xi(r)$, but it is in general a fraction of $R_s$.

Thus the integral constraint for the FS estimator, Eq. (36), does not simply introduce an offset, but a change in the functional behavior of the estimated correlation function. Other choices introduce distortions at a scale which is difficult to evaluate, especially when the sample does not have a simple spherical geometry. In general any estimator is distorted at some scale by a condition of the type given by Eq. (36), which basically reflects our ignorance of the value of the ensemble average density.

In order to study the effect of the integral constraint for the FS estimator, let us rewrite the estimated correlation function in terms of the theoretical correlation function:

$$\overline{\xi}(r) = \frac{1 + \xi(r)}{1 + \frac{3}{R_s^3} \int_0^{R_s} \xi(r')\, r'^2\, dr'} - 1 . \qquad (37)$$

In writing Eq. (37) we assume that the stochastic noise is negligible, which of course is not a good approximation at any scale. However, in this way we are able to understand the effect of the integral constraint for the FS estimator. From Eq. (37) it is clear that this estimator is biased, as it does not satisfy Eq. (29) but only Eq. (28).

Let us consider two useful examples for the theoretical correlation function: (i) a power law $\xi(r) \sim r^{-\gamma}$ and (ii) the ΛCDM model discussed in the previous section. The distortion due to the integral constraint in the FS estimator, for a theoretical correlation function with power-law behavior and exponent $\gamma = 2$, is illustrated in Fig. 12. One may see that at $r \approx R_s/3$ the estimation is already distorted, and when $r \approx R_s/2$ the function $\overline{\xi}(r)$ crosses zero and becomes negative in order to satisfy Eq. (36).

Fig. 12. Absolute value of the estimation of the correlation function $\xi(r) \sim r^{-\gamma}$, with $\gamma = 2$, by the FS estimator. The thick solid line represents the theoretical model. The condition given by the integral constraint described by Eq. (37) is taken into account: beyond the scale at which the power-law behavior breaks, the correlation function crosses zero and becomes negative.
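The power-law case can be worked out in closed form. The sketch below assumes the noise-free expression $\overline{\xi}(r) = (1 + \xi(r)) / (1 + \frac{3}{R_s^3}\int_0^{R_s} \xi\, r^2 dr) - 1$, which follows from the FS estimator with the sample density normalized over the sphere of radius $R_s$, and a pure power law $\xi(r) = A r^{-\gamma}$ with $\gamma < 3$; the function names and the unit amplitude are illustrative choices, not taken from the paper.

```python
import math


def xi_power_law(r, amplitude=1.0, gamma=2.0):
    """Theoretical correlation function xi(r) = A r^(-gamma)."""
    return amplitude * r ** (-gamma)


def fs_constrained(r, r_s, amplitude=1.0, gamma=2.0):
    """Noise-free FS estimate subject to the integral constraint in a sphere of radius r_s."""
    if gamma >= 3.0:
        raise ValueError("the volume integral diverges at r -> 0 for gamma >= 3")
    # closed form of (3 / r_s^3) * integral_0^{r_s} A r^(2 - gamma) dr
    mean_xi = 3.0 * amplitude * r_s ** (-gamma) / (3.0 - gamma)
    return (1.0 + xi_power_law(r, amplitude, gamma)) / (1.0 + mean_xi) - 1.0


def zero_crossing(r_s, gamma=2.0):
    """Scale at which the constrained estimate changes sign.

    The sign changes where xi(r) equals its volume average over the sphere,
    i.e. r = r_s * ((3 - gamma) / 3)^(1 / gamma), independently of the amplitude.
    """
    return r_s * ((3.0 - gamma) / 3.0) ** (1.0 / gamma)
```

For $\gamma = 2$ the crossing falls at $R_s/\sqrt{3} \approx 0.58\,R_s$, of the same order as the $r \approx R_s/2$ quoted above, and the estimate is negative at all larger separations.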

The case of the ΛCDM model is shown in Fig. 13. The situation is similar to the power-law case as long as one considers $R_s$ smaller than the zero-point scale $r_c$. For larger $R_s$ one may see that the zero point no longer changes, while the negative tail continues to be amplified in a non-linear way even at scales $r < R_s$. For example, with a sample of size $R_s \approx 600$ Mpc/h the distortion of the power-law tail does not allow one to detect the $\xi(r) \sim -r^{-4}$ behavior, which is marginally visible only when $R_s > 1000$ Mpc/h.

Fig. 13. Absolute value of the estimation of the correlation function of the ΛCDM model with the integral constraint described by Eq. (37). The thick solid line represents the theoretical model.

5.3. Pairwise estimators

To determine a pairwise estimator we define the following quantities: the number of data-data pairs

$$DD(r) = \sum_{i=1}^{N_d} dd_i(r, \Delta r) , \qquad (38)$$

the number of data-random pairs

$$DR(r) = \sum_{i=1}^{N_d} dr_i(r, \Delta r) , \qquad (39)$$

and the number of random-random pairs

$$RR(r) = \sum_{i=1}^{N_r} rr_i(r, \Delta r) , \qquad (40)$$

where $N_d$ is the number of data points, $N_r$ is the number of random points, which are Poisson distributed, and $dd_i(r, \Delta r)$, $dr_i(r, \Delta r)$ and $rr_i(r, \Delta r)$ are respectively the numbers of data-data, data-random and random-random pairs in the shell of radius $r$ and thickness $\Delta r$ around the $i$-th center.

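The DP estimator is defined as (Davis and Peebles, 1983)$^1$

$$\xi_{DP}(r) = \frac{N_r}{N_d - 1} \frac{DD(r)}{DR(r)} - 1 . \qquad (41)$$

The LS estimator is defined as (Landy and Szalay, 1993)

$$\xi_{LS}(r) = \frac{N_r (N_r - 1)}{N_d (N_d - 1)} \frac{DD(r)}{RR(r)} - 2\, \frac{N_r - 1}{N_d} \frac{DR(r)}{RR(r)} + 1 . \qquad (42)$$

Finally the Hamilton estimator is defined as (Hamilton, 1993)

$$\xi_{H}(r) = \frac{N_r N_d}{(N_r - 1)(N_d - 1)} \frac{DD(r)\, RR(r)}{DR^2(r)} - 1 . \qquad (43)$$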
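Once the pair counts are available, the three pairwise estimators reduce to simple arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the counts are summed over centers, so that each data-data and each random-random pair is counted twice (ordered pairs), which is the convention under which the normalization factors of Eqs. (41)-(43) make all three estimators vanish for an uncorrelated sample. The factor 2 in the cross term is the standard Landy and Szalay form.

```python
def davis_peebles(dd, dr, n_d, n_r):
    """DP estimator from data-data and data-random pair counts."""
    return n_r / (n_d - 1) * dd / dr - 1.0


def landy_szalay(dd, dr, rr, n_d, n_r):
    """LS estimator, i.e. (DD - 2 DR + RR) / RR with normalized pair counts."""
    return (n_r * (n_r - 1) / (n_d * (n_d - 1)) * dd / rr
            - 2.0 * (n_r - 1) / n_d * dr / rr
            + 1.0)


def hamilton(dd, dr, rr, n_d, n_r):
    """Hamilton estimator, DD * RR / DR^2 with normalized pair counts."""
    return n_r * n_d / ((n_r - 1) * (n_d - 1)) * dd * rr / dr ** 2 - 1.0
```

As a consistency check, counts proportional to $N_d(N_d - 1)$, $N_d N_r$ and $N_r(N_r - 1)$ give zero for all three estimators, and doubling $DD$ alone gives $\xi = 1$ in each case.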



5.4. Errors 



The measurement errors of the correlation function can be determined in various ways. The first is a calculation of the error on $\xi(r)$ in a given sample using the Poisson estimate (Ross et al., 2007)

$$\sigma_P(r) = \frac{1 + \xi(r)}{\sqrt{DD(r)}} . \qquad (44)$$



$^1$ For the sake of clarity, hereafter we denote the estimator as $\xi_{XX}$, where XX can be FS for the full-shell case, DP for the Davis and Peebles case and LS for the Landy and Szalay case. We omit the overline symbol $\overline{X}$, which was previously introduced to indicate an estimator of the statistical quantity $X$.

The second error estimation method is the field-to-field error, which is obtained by dividing the whole sample into $N$ subsamples and by computing in each of these the correlation function $\xi_i(r)$ for $i = 1 \ldots N$

i—l ^ ^ 

and $\xi(r)$ is the estimation of the correlation function in the whole sample. The third method is called the jackknife estimate (Scranton et al., 2002, Zehavi et al., 2004) and the variance is estimated by

'r'jaAr) - t^^{lA^-W)y (46) 

where the index $i'$ is used to signify that each time the value of the correlation function $\xi_{i'}(r)$ is computed in all subsamples but one (the $i$-th). Finally, another possibility is to divide the sample into $N$ subfields, to compute the average

$$\overline{\xi}(r) = \frac{1}{N} \sum_{i=1}^{N} \xi_i(r) \qquad (47)$$

and then the variance on the average 

<^air) = j;^ Jf^l ■ (48) 

4=1 

We show in what follows that Eq. (48) is equivalent to Eq. (45) at all but the largest scales of the sample, where it gives a more conservative estimation of the errors. In what follows we make use of the errors estimated by Eq. (48), which are similar to the jackknife ones (Eq. (46)). Below we discuss in detail the determination of the errors in artificial distributions and compare the different methods used to define them.
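The subfield-based error estimates can be illustrated with a short Python sketch operating on the values of the correlation function measured at one fixed scale. It is a sketch under explicit assumptions: `subfield_mean` is the plain average over subfields, `error_on_mean` reads "the variance on the average" as the standard error of the mean, and `jackknife_error` is the textbook delete-one jackknife formula. These normalizations are assumptions and need not coincide with those adopted in the equations above.

```python
import math


def subfield_mean(xis):
    """Average of the N subfield estimates of xi at a given scale."""
    return sum(xis) / len(xis)


def error_on_mean(xis):
    """Standard error of the mean over N subfields (assumed normalization)."""
    n = len(xis)
    mean = subfield_mean(xis)
    sample_var = sum((x - mean) ** 2 for x in xis) / (n - 1)
    return math.sqrt(sample_var / n)


def jackknife_error(leave_one_out):
    """Textbook delete-one jackknife error from N leave-one-out estimates."""
    n = len(leave_one_out)
    mean = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((x - mean) ** 2 for x in leave_one_out))
```

When the statistic is itself a plain average of the subfield values, the two error estimates coincide exactly; differences between them arise because the correlation function measured in all subsamples but one is not simply the average of the individual subsample estimates.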

6. Test on artificial distributions 

We consider a distribution of points extracted from a cosmological N-body simulation generated in the framework of the Millennium project (Springel et al., 2005), which consists of $N$ = 6,528,040 particles in a cubic box of nominal side $L = 1$ and which is one of the semi-analytic catalogs (Croton et al., 2006) constructed to produce mock galaxy samples. This distribution presents strong clustering up to a scale of $r_0 \approx 0.01$ and then weak power-law correlations up to the sample size. We compare the results of each estimator in sub-boxes of varying size with the determination of the FS estimator in the box of side $L = 1$, which we take as a reference. In principle one would like to have a theoretical prediction to compare with; however, because of the formation of non-linearities and of the sampling used to produce these distributions, there is no simple way to compute the theoretical correlation function. This is the reason why we have chosen the correlation function computed in the entire box as a reference. In addition, for all statistical quantities considered, we limit our analysis to the scale $R_s = 0.2$ in order to minimize finite size effects. In what follows we report the results obtained by using the field-to-field average quantities and variance (i.e. Eq. (47) and Eq. (48)), which we find to be the most conservative error determinations. Below we also present a discussion of the different determinations of the errors.

Fig. 14. Average correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.15$ together with the prediction of Eq. (37). The solid line represents the correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.5$, while the dotted line (IC) represents the analytical computation of the estimated correlation with the integral constraint (i.e. Eq. (37)).

6.1. Cubic Samples 

We have divided the box of side $L = 1$ into $N_f$ non-overlapping sub-boxes of side $\ell = 0.05, 0.1, 0.15, 0.2, 0.25$ and we have computed the correlation function in each of the sub-boxes. Note that the number of sub-boxes over which the calculations are performed is kept constant, independently of their size, at $N_f = 16$. The determination of the correlation function by using the FS estimator is shown in Fig. 14. The main difference between the estimated correlation function and the "true" one is due to the integral constraint. This can be shown by comparing the estimated correlation function with that computed by using Eq. (37), which describes the effect of the integral constraint.

In Fig. 15 we compare the determinations of the correlation function by the FS estimator in sub-boxes of different sizes. One may note that the effect of the integral constraint on the amplitude of the estimated correlation function is important for the subsamples with $\ell < 0.1$, as in this case the distribution is strongly non-linear inside the sample and thus the determination of the sample density strongly depends on the sample size. This is shown by both a smaller amplitude and a smaller range of distance scales over which the correlation function is positive. The break in the positive behavior occurs at a distance scale of order $\ell$, independently of the amplitude of the correlation function. This is again a finite-size effect which can be easily understood as due to the integral constraint.

Fig. 15. Average correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.05, 0.1, 0.15, 0.2, 0.25$ respectively. The solid line represents the correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.5$.

To summarize there are two distinct effects: (i) the 
amplitude of the estimated correlation function strongly 
depends on the sample size when the distribution exhibits 
strong clustering and (ii) the artificial break of the positive 
correlations is sample-size dependent. 

In Figs. 16-17 we compare the FS, DP and LS estimators. One may note that the DP and LS estimators are biased by a similar effect as the FS estimator, due to the integral constraint, although the break in the power-law behavior seems to occur at slightly larger scales than for the FS estimator. This difference can be attributed to the fact that the LS and DP estimators implicitly use the estimation of the average density at the scale $\ell$ instead of at the scale $\ell/2$ as the FS estimator does. To clarify this point, in the next section we present some other tests which have been tuned to explore this effect.

In Fig. 18 we compare the LS and Hamilton estimators. We confirm the result of Kerscher et al. (2000) that the Hamilton and LS estimators give indistinguishable results, within the error bars, and thus we focus on the LS estimator hereafter.

In Figs. 19-20 we show the average correlation function computed by using the LS and DP estimators in independent sub-boxes of side $\ell = 0.05, 0.1, 0.15, 0.2, 0.25$ respectively: the finite size dependences of the amplitude and of the break are still present, as for the FS estimator, and analogously to that case they can be understood as an effect of the integral constraint.



Fig. 16. Average correlation function computed by using the FS, LS and DP estimators respectively in independent sub-boxes of side $\ell = 0.05$. The solid line represents the correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.5$.

Fig. 17. Average correlation function computed by using the FS, LS and DP estimators respectively in independent sub-boxes of side $\ell = 0.25$. The solid line represents the correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.5$.

6.2. Slices 

We have seen that the estimation of the correlation function is affected by a finite size effect which depends on the sample size, where the sample has so far been taken to have a simple geometrical shape such as a sphere or a cubic box. In order to investigate a situation closer to real observations we have constructed several subsamples of the original distribution in the following way. We have placed the observer at the center of the box, (0.5, 0.5, 0.5), and we have identified a sphere of radius 0.5 centered on that point. We have considered the spherical coordinates $\alpha, \delta, r$ of the distribution points with respect to this center, where $0 < \alpha < 2\pi$, $-\pi/2 < \delta < \pi/2$ and $0 < r \le 0.5$. It is then possible to construct several subsamples which have a certain depth $R_{depth} \le 0.5$ and specific cuts in $\alpha$ and $\delta$. In general the solid angle of a portion of a sphere is

$$\Omega = \Delta\alpha \times \Delta\mu , \qquad (49)$$

where $\Delta\alpha = \alpha_2 - \alpha_1$, with $\alpha_1, \alpha_2$ the limits in right ascension delimiting the angular region, and $\Delta\mu = \sin(\delta_2) - \sin(\delta_1)$, with $\delta_1, \delta_2$ the limits in declination delimiting the angular region. We have chosen $\Delta\mu = 2$, i.e. $\delta_1 = -\pi/2$ and $\delta_2 = \pi/2$, and $\Delta\alpha = \mathrm{const}$. In this way we have constructed $N_f$ independent spherical slices with constant solid angle and the same geometry. The number of slices is thus $N_f = 2\pi / \Delta\alpha$: we have taken $N_f \le 30$. We have then computed the LS and DP estimators and their field-to-field variance (Eq. (48)).

Fig. 18. Average correlation function computed by using the LS and Hamilton estimators respectively in independent sub-boxes of side $\ell = 0.05$.

Fig. 19. Average correlation function computed by using the LS estimator in independent sub-boxes of side $\ell = 0.05, 0.1, 0.15, 0.2, 0.25$ respectively. The solid line represents the correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.5$.

Fig. 20. Average correlation function computed by using the DP estimator in independent sub-boxes of side $\ell = 0.05, 0.1, 0.15, 0.2, 0.25$ respectively. The solid line represents the correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.5$.

Fig. 21. Average correlation function computed by using the FS, LS and DP estimators respectively in $N_f = 30$ angular slices with $\Delta\alpha = 0.0063$. The solid line represents the correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.5$.
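The construction of the angular slices can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function names are hypothetical, and the choice of keeping the first `n_slices` contiguous slices in right ascension is an assumption, since the text does not specify which slices are retained. With $\Delta\mu = 2$ the full range in declination is kept, so a slice is selected by the right ascension $\alpha$ and the depth only.

```python
import math


def slice_index(point, center=(0.5, 0.5, 0.5), delta_alpha=0.063,
                r_depth=0.5, n_slices=30):
    """Return the index of the right-ascension slice containing `point`, or None.

    A point is kept only if it lies within the depth r_depth of the observer
    and falls in one of the first n_slices slices of width delta_alpha.
    """
    x, y, z = (p - c for p, c in zip(point, center))
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0 or r > r_depth:
        return None
    alpha = math.atan2(y, x) % (2.0 * math.pi)  # right ascension in [0, 2 pi)
    k = int(alpha // delta_alpha)
    return k if k < n_slices else None


def solid_angle(delta_alpha, delta_mu=2.0):
    """Solid angle of a slice, Omega = delta_alpha * delta_mu."""
    return delta_alpha * delta_mu
```

Grouping the points by the returned index gives the subsamples on which a pairwise estimator can then be evaluated; all slices have the same solid angle and the same geometry by construction.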

In Figs. 21-23 we show the average correlation function computed by using the LS and DP estimators in $N_f = 30$ angular slices with $\Delta\alpha = 0.0063, 0.013, 0.063$ respectively. One may note that the LS and DP estimators are very similar, although the LS estimator extends to slightly larger scales. The amplitude in this case corresponds to the expectation value for the FS estimator in a box of side $R_{depth} = 0.05$, which is about ten times larger than the radius of the maximum sphere fully included in the sample volume.

Fig. 22. Average correlation function computed by using the FS, LS and DP estimators respectively in $N_f = 30$ angular slices with $\Delta\alpha = 0.013$. The solid line represents the correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.5$.

Fig. 23. Average correlation function computed by using the FS, LS and DP estimators respectively in $N_f = 30$ angular slices with $\Delta\alpha = 0.063$. The solid line represents the correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.5$.

Fig. 24. Average correlation function computed by using the LS estimator in $N_f = 30$ angular slices with $\Delta\alpha = 0.0063, 0.013, 0.063$. The solid line represents the correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.5$.

Fig. 25. Average correlation function computed by using the DP estimator in $N_f = 30$ angular slices with $\Delta\alpha = 0.0063, 0.013, 0.063$. The solid line represents the correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.5$.



By comparing (see Figs. 24-25) the FS, LS and DP estimators computed in angular slices with $\Delta\alpha = 0.0063, 0.013, 0.063$, one may note that the amplitude slightly increases when a larger solid angle is chosen, and the range of scales over which one may estimate the correlation function also increases with $\Delta\alpha$. The exact location of the break of the power-law behavior and the value of the amplitude are in agreement with a value of $R_{depth}$ in the integral constraint of the order of the sample depth $\ell$, and not of the radius of the maximum sphere fully enclosed, as is the case for the FS estimator.



In Fig. 26 we finally show the average behavior of the LS estimator in $N_f = 30$ angular slices with $\Delta\alpha = 0.063$ and with a varying depth of the sample, $R_{depth} = 0.1, 0.2, 0.5$. The finite size dependence of the amplitude and of the scale at which the break in the power-law behavior occurs is clear. This represents an interesting test to be performed on the galaxy data, as we discuss below.

6.3. Determination of the errors 

In Fig. 27 we show the behavior of the errors computed by Eq. (44), Eq. (45), Eq. (46) and Eq. (48). One may note that the errors determined by the jackknife method, Eq. (46), are approximately the same as the ones computed by the field-to-field fluctuations, Eq. (48), except at small scales, where the jackknife method is more efficient and gives smaller fluctuations (see discussion in Scranton et al., 2002 and Zehavi et al., 2002). On the other hand the jackknife error is greater than the two other estimates, Eq. (44) and Eq. (45) (see also Ross et al., 2006). Apart from the small difference at small scales between Eq. (46) and Eq. (48), the former are larger at scales comparable with the sample size and give a more conservative estimation of the fluctuations.

Fig. 26. Average correlation function computed by using the LS estimator in $N_f = 30$ angular slices with $\Delta\alpha = 0.063$ and with a varying depth of the sample, $R_{depth} = 0.1, 0.2, 0.5$. The solid line represents the correlation function computed by using the FS estimator in independent sub-boxes of side $\ell = 0.5$.

Fig. 27. Errors in the estimation of the correlation function determined by Eq. (44) (Poisson), Eq. (45) (FTF), Eq. (46) (J) and Eq. (48) (a).



6.4. Summary and discussion 

We have studied the finite size dependence of the estimated two-point correlation function by considering three different estimators: the FS, the DP and the LS. We considered the case of a point distribution presenting, on large enough scales in the sample, weak ($\xi(r) < 1$) power-law correlations. We have performed a series of tests to establish the role of the biases due to the integral constraint. This is the principal systematic effect which affects the behavior of the estimated correlation function at large scales, independently of the particular estimator considered. Let us briefly discuss our main results.

We have first considered the determination of the correlation function in cubic subsamples of size $\ell < L = 1$, where $L$ is the whole box size. We have constructed our estimation as an average over $N_f$ disjoint sub-boxes. We have studied the behavior of the FS estimator as a function of the size $\ell$ of the sub-boxes, finding a clear finite size dependence of both the amplitude (for small $\ell$) and of the length scale $r_*$ characterizing the break of the power-law behavior, beyond which the correlation function becomes negative. In agreement with the simple analytical study of the problem discussed in the previous section, we found that $r_* \sim \ell/2$. A similar situation occurs for the LS and DP estimators, even though in this case $r_* \sim \ell$. We note that the LS and DP estimators give very similar results over the whole range of scales.

In order to understand in more detail the spatial extension of the reliable measurements of two-point correlations provided by different estimators, we have considered samples with a geometry more similar to that of real galaxy samples. Namely, we have considered a sphere around the central point of the box of size $L$ and divided it into $N_f$ sub-samples with the same solid angle $\Omega$. We also considered subsequent cuts in the depth $\ell < L$. We found that the length-scale $r_*$ shows a dependence on $\Omega$ and that it typically reaches a value of the order of a fraction of $\ell$, which is larger than the scale $R_s$ up to which the FS estimator can be applied and which is of the order of the radius of the maximum sphere fully enclosed in the sample volume. We have then found that the scale $r_*$ has a strong dependence on the value of $\ell$, as in the case of the simple cubic volumes considered in the previous test.

It is important to note that the tests discussed here have been performed on a distribution which becomes uniform well inside the sample size. The above considerations on the performance of the various estimators can be easily verified for other distributions which satisfy the property of becoming uniform well inside a given sample and which show different correlation properties on large scales. However, the situation is rather different when a distribution exhibits strong clustering inside a given sample without a clear crossover toward a uniform distribution. In this case the best estimator is the most conservative one, i.e. the FS estimator, as the estimation of the sample density is certainly biased at any scale as long as the distribution is characterized by strong non-linear clustering (see, e.g., Gabrielli et al., 2004 for the treatment of the strongly correlated case).

This situation calls for serious caution in the determination
of the correlation at large scales in a given sample. If an
estimated correlation function presents a break of, for
example, the power-law behavior at a certain scale, the crucial
test to be performed is to check whether this is a finite-size
effect or a true break. This is especially relevant for
CDM-type correlations, for which the correlation function,
according to theoretical models, should present a break from
the small-scale power-law correlation at a scale of order
124 Mpc/h. We will come back to this point in the conclusions.

Finally, we considered different determinations of the errors
of the estimators of the two-point correlation function. The
most conservative way to estimate errors consists in computing
the correlation function in disjoint regions and then
computing the average and the variance of the average. This
method is less efficient than the jackknife method at small
scales but gives similar results at large scales.
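
The two error estimates compared above can be sketched as follows.
This is a minimal illustration under stated assumptions: each row of
the input array is taken to hold the correlation function measured
either in one disjoint region (`subsample_error`) or in the sample
with one region removed (`jackknife_error`); the function names and
array layout are hypothetical.

```python
import numpy as np

def subsample_error(xi_sub):
    """Mean and error on the mean from N disjoint regions.

    xi_sub has shape (N, n_bins): one correlation function per region.
    """
    xi_sub = np.asarray(xi_sub, dtype=float)
    n = xi_sub.shape[0]
    mean = xi_sub.mean(axis=0)
    # unbiased scatter among regions, reduced by sqrt(N) for the average
    err = xi_sub.std(axis=0, ddof=1) / np.sqrt(n)
    return mean, err

def jackknife_error(xi_jk):
    """Mean and jackknife error from N leave-one-out estimates.

    xi_jk has shape (N, n_bins): row i is measured with region i removed.
    """
    xi_jk = np.asarray(xi_jk, dtype=float)
    n = xi_jk.shape[0]
    mean = xi_jk.mean(axis=0)
    # standard delete-one jackknife variance
    var = (n - 1) / n * np.sum((xi_jk - mean) ** 2, axis=0)
    return mean, np.sqrt(var)
```

For a statistic that is a simple average over regions the two
estimates coincide; differences arise because the correlation
function estimators are not linear in the data and because the
leave-one-out samples strongly overlap.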

7. Conclusions 

We have considered the real-space properties of CDM density
fields, focusing in detail on a particular variant known as the
ΛCDM (vanilla) model. It is well known that the power spectrum
typically behaves as $P(k) \sim k^m$ with $-3 < m < -1$ at large
wavenumbers $k > k_c$, and as $P(k) \sim k$ at small wavenumbers
$k < k_c$. We discussed that, correspondingly, the two-point
correlation function shows approximately a positive power-law
behavior $\xi(r) \sim r^{-(3+m)}$ at small scales $r < r_c \approx k_c^{-1}$ and a
negative power-law behavior $\xi(r) \propto -r^{-4}$ at large scales
$r > r_c$, where the zero-crossing occurs at about $r_c \approx 124$ Mpc/h
in the model considered. We discussed the fact that, globally,
a system with this type of correlations belongs to the category
of super-homogeneous distributions, which are configurations
of points more ordered than a purely uncorrelated (Poisson)
distribution. Correspondingly, fluctuations are depressed with
respect to the Poisson case, and the normalized mass variance,
for instance, decays faster ($\sigma^2(r) \sim r^{-4}$) than in the Poisson
case ($\sigma^2(r) \sim r^{-3}$). The condition of super-homogeneity is
expressed by the requirement that $P(k) \to 0$ for $k \to 0$, or
alternatively that

$$\int_0^\infty \xi(r)\, r^2 \, dr = 0 \, .$$
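
The equivalence of the two forms of the super-homogeneity condition
can be made explicit with a short worked step. Assuming the
convention $P(\vec{k}) = \int \xi(r)\, e^{-i \vec{k} \cdot \vec{r}}\, d^3r$ for a statistically
isotropic field (normalization prefactors vary between authors and
do not affect the argument), the angular integration gives

$$P(k) = 4\pi \int_0^\infty \xi(r)\, \frac{\sin(kr)}{kr}\, r^2 \, dr \quad \Longrightarrow \quad P(0) = 4\pi \int_0^\infty \xi(r)\, r^2 \, dr \, ,$$

since $\sin(kr)/(kr) \to 1$ for $k \to 0$. A vanishing power spectrum at
$k \to 0$ therefore requires the positive correlations at small scales
to be exactly balanced by anti-correlations at large scales.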

Following the work of Durrer et al. (2003) we have pointed out
that the above condition is broken when one samples the
distribution, as for example when the simplest biasing scheme
of correlated Gaussian fields (introduced by Kaiser, 1984) is
applied. This is particularly important for the behavior of
the power spectrum for $k < k_c$, which, under biasing, remains
constant instead of going as $P(k) \sim k$. The correlation function
at large scales $r > r_c$ is instead expected to be linearly
amplified with respect to the original one of the whole matter
field. Thus the large-scale negative tail $\xi(r) \sim -r^{-4}$ is the
main feature which one would like to detect in order to test
theoretical models.

Given that $\xi(r)$ has a very small amplitude where it becomes
negative, the determination of the negative power-law tail
represents a very challenging problem. We have discussed the
fact that, to a first approximation in a real measurement, one
may treat the system as having positive correlations at small
scales with an exponential cut-off at the scale $r_c$, beyond
which it becomes uncorrelated (a situation which can be
regarded as an upper limit to the presence of
anti-correlations). This implies that on scales $r > r_c \approx 124$
Mpc/h the galaxy distribution should not present any positive
correlation. Whether this behavior is compatible with the
existence of structures of order 200 Mpc/h or more is an open
problem which has to be addressed in the studies of
forthcoming galaxy catalogs.

In more detail, one of the most basic results (see e.g.,
Peebles 1980) about self-gravitating systems, treated using
perturbative approaches to the problem (i.e. the fluid limit),
is that the amplitude of small fluctuations grows
monotonically in time, in a way which is independent of the
scale. This linearized treatment breaks down at any given
scale when the relative fluctuation at that scale becomes of
order unity, signaling the onset of the "non-linear" phase of
gravitational collapse of the mass in regions of the
corresponding size. If the initial velocity dispersion of
particles is small, non-linear structures start to develop at
small scales first and then the evolution becomes
"hierarchical", i.e., structures build up at successively
larger scales. Given the finite time from the initial
conditions to the present day, the development of non-linear
structures is limited in space, i.e., they cannot be more
extended than the scale at which the linear approach predicts
that the density contrast becomes of order unity at the
present time. This scale is fixed by the initial amplitude of
fluctuations, constrained by the cosmic microwave background
anisotropies (Spergel et al., 2007), by the hypothesized
nature of the dominating dark matter component and by its
correlation properties. According to current models of
CDM-type, the scales at which non-linear clustering occurs at
the present time (of order 10 Mpc) are much smaller than the
scale $r_c \approx 124$ Mpc/h (see e.g. Springel et al., 2005). Thus
the region where the super-homogeneous features appear should
still be in the linear regime, allowing a direct test of the
initial conditions predicted by early universe models. The
scale $r_c$ marks the maximum extension of positively correlated
structures: beyond $r_c$ the distribution must have been
anti-correlated from the beginning, as there was no time to
develop other correlations. The possible presence of
structures which mark long-range correlations, whether or not
of large amplitude, reported by observations of galaxy
distributions (like the Sloan Great Wall; see Gott et al.,
2005), by the detection of dark matter distributions (see e.g.
Massey et al., 2007) and by the large void of radius $\sim 140$ Mpc
identified by Rudnick et al. (2007), may indicate that positive
correlations extend well beyond $r_c$.

We have discussed that an important finite-size effect must
be considered when estimating the correlation function, one
which may mimic a break of the power-law behavior similar to
that of CDM models at a scale of order $r_c$. This is related to
the effect of the integral constraint in the estimators,
namely the fact that the sample average, estimated in a finite
sample, differs from the ensemble average and can be
finite-size dependent. This situation occurs when correlations
(weak or strong) extend to scales larger than the sample size.
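
A toy calculation illustrates how the integral constraint alone can
produce an apparent break. The sketch below is an assumption-laden
example and not the analysis performed in this paper: it takes a
hypothetical true correlation function $\xi(r) = A r^{-\gamma}$ that is
positive at all scales (with $\gamma < 3$), and assumes that the density
used for normalization is the average conditional density inside a
sphere of radius $R$, so that the apparent correlation is
$(1 + \xi(r))/(1 + \bar{\xi}(R)) - 1$ with $\bar{\xi}(R)$ the volume average of $\xi$
within $R$. The values of `A` and `gamma` are arbitrary.

```python
import numpy as np

def apparent_xi(r, R, A=1.0, gamma=1.8):
    """Apparent correlation when the density is normalized inside radius R.

    Toy assumption: true xi(r) = A * r**(-gamma), positive everywhere, gamma < 3.
    """
    xi_true = A * r ** (-gamma)
    # volume average of xi_true over a sphere of radius R
    xi_bar = 3.0 * A * R ** (-gamma) / (3.0 - gamma)
    return (1.0 + xi_true) / (1.0 + xi_bar) - 1.0

def apparent_zero_crossing(R, gamma=1.8):
    """Scale at which apparent_xi changes sign; it is proportional to R."""
    return R * ((3.0 - gamma) / 3.0) ** (1.0 / gamma)
```

In this toy model the apparent zero-crossing occurs at a fixed
fraction of $R$ (about $0.6 R$ for $\gamma = 1.8$), although the true
correlation function never becomes negative: the break moves with
the sample size, which is the signature of a finite-size effect.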

For these reasons, in order to study the two-point correlation
function in real galaxy samples when its amplitude becomes
smaller than unity, it is crucial to check whether the break
of the power-law behavior has a finite-size dependence or not,
by choosing samples with different depths. In this perspective
the assessment of the reality of the break of the two-point
correlation function is the main observational point to be
considered. Once this is clarified, other features should be
considered, for example the so-called baryonic bump, which is
a very small perturbation to the overall shape of the
correlation function at scales of order of the zero-crossing
scale $r_c$. We will present a detailed analysis of the
correlation properties of the galaxy distribution in the SDSS
catalog, considering specific tests for finite-size effects in
the determination of the correlation function, in a
forthcoming paper.
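
One simple way to quantify the depth test described above is to
measure the break scale in samples of several depths and fit how it
scales with depth. The following is a minimal sketch of such a
diagnostic; the function name and the use of a log-log least-squares
fit are illustrative assumptions, not a procedure taken from this
work.

```python
import numpy as np

def break_scaling_exponent(depths, r_break):
    """Least-squares slope of log(r_break) versus log(depth).

    A slope near 1 means the break scales with the sample depth
    (finite-size effect); a slope near 0 means it does not.
    """
    slope, _ = np.polyfit(np.log(depths), np.log(r_break), 1)
    return slope
```

For example, break scales equal to half the depth at every depth
give a slope of 1, while a break fixed at 124 Mpc/h in all samples
gives a slope of 0. Real measurements would of course require error
bars on each break scale before drawing a conclusion.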